Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will overfit or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
overfitting and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating overfitting and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
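Findings (i) and (ii) above rest on comparing the partitions that different algorithms return for the same network. The following is a minimal sketch of one such comparison, assuming normalized mutual information (NMI) as the similarity measure; the abstract does not say which measure the authors use, and the function names and toy partitions here are hypothetical.

```python
from collections import Counter
from math import log


def count_communities(partition):
    """Number of distinct community labels in a node -> label mapping."""
    return len(set(partition.values()))


def normalized_mutual_information(part_a, part_b):
    """NMI between two partitions of the same node set.

    Each partition is a dict mapping node -> community label.
    Returns 1.0 for identical groupings (up to relabeling) and
    0.0 when the two groupings are statistically independent.
    """
    if set(part_a) != set(part_b):
        raise ValueError("partitions must cover the same nodes")
    nodes = list(part_a)
    n = len(nodes)

    count_a = Counter(part_a[v] for v in nodes)
    count_b = Counter(part_b[v] for v in nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)

    # Mutual information between the two label assignments.
    mutual = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    # Entropy of each label assignment.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions place every node in one community
    return 2 * mutual / (h_a + h_b)


# Hypothetical outputs of three algorithms on the same four-node network.
algo_1 = {0: "x", 1: "x", 2: "y", 3: "y"}
algo_2 = {0: 1, 1: 1, 2: 2, 3: 2}   # same grouping, different labels
algo_3 = {0: 1, 1: 2, 2: 1, 3: 2}   # grouping that cuts across algo_1
```

Computing this score for every pair of algorithms on every network would give a similarity matrix that could then be clustered, which is one plausible way to arrive at the high-level groups described in (ii).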
Towards Understanding Cyberbullying Behavior in a Semi-Anonymous Social Network
Cyberbullying has emerged as an important and growing social problem, wherein
people use online social networks and mobile phones to bully victims with
offensive text, images, audio and video on a 24/7 basis. This paper studies
negative user behavior in the Ask.fm social network, a popular new site that
has led to many cases of cyberbullying, some leading to suicidal behavior. We
examine the occurrence of negative words in Ask.fm's question+answer profiles
along with the social network of likes of questions+answers. We also examine
properties of users with cutting behavior in this social network.
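The occurrence of negative words in a profile could be measured along the following lines. This is a rough sketch only: the word list, the tokenization, and the function name are assumptions made for illustration, and the abstract does not describe the lexicon or method the authors actually used.

```python
import re

# Hypothetical lexicon; a real study would use a curated word list.
NEGATIVE_WORDS = {"stupid", "ugly", "hate", "loser"}


def negative_word_rate(text, lexicon=NEGATIVE_WORDS):
    """Fraction of word tokens in `text` that appear in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


# Example on a single made-up question+answer pair.
rate = negative_word_rate("You are stupid and ugly")
```

Averaging this rate over all question+answer pairs in a profile would give one per-user measure of negativity.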
The diminishing state of shared reality on US television news
The potential for a large, diverse population to coexist peacefully is
thought to depend on the existence of a "shared reality": a public sphere in
which participants are exposed to similar facts about similar topics. A
generation ago, broadcast television news was widely considered to serve this
function; however, since the rise of cable news in the 1990s, critics and
scholars have worried that the corresponding fragmentation and segregation of
audiences along partisan lines has caused this shared reality to be lost. Here
we examine this concern using a unique combination of data sets tracking the
production (since 2012) and consumption (since 2016) of television news content
on the three largest cable and broadcast networks respectively. With regard to
production, we find strong evidence for the "loss of shared reality"
hypothesis: while broadcast continues to cover similar topics with similar
language, cable news networks have become increasingly distinct, both from
broadcast news and each other, diverging both in terms of content and language.
With regard to consumption, we find more mixed evidence: while broadcast news
has indeed declined in popularity, it remains the dominant source of news for
roughly 50% more Americans than does cable; moreover, its decline, while
somewhat attributable to cable, appears driven more by a shift away from news
consumption altogether than a growth in cable consumption. We conclude that
shared reality on US television news is indeed diminishing, but is more robust
than previously thought and is declining for somewhat different reasons.
Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube
Although it is understudied relative to other social media platforms, YouTube
is arguably the largest and most engaging online media consumption platform in
the world. Recently, YouTube's outsize influence has sparked concerns that its
recommendation algorithm systematically directs users to radical right-wing
content. Here we investigate these concerns with large scale longitudinal data
of individuals' browsing behavior spanning January 2016 through December 2019.
Consistent with previous work, we find that political news content accounts for
a relatively small fraction (11%) of consumption on YouTube, and is dominated
by mainstream and largely centrist sources. However, we also find evidence for
a small but growing "echo chamber" of far-right content consumption. Users in
this community show higher engagement and greater "stickiness" than users who
consume any other category of content. Moreover, YouTube accounts for an
increasing fraction of these users' overall online news consumption. Finally,
while the size, intensity, and growth of this echo chamber present real
concerns, we find no evidence that they are caused by YouTube recommendations.
Rather, consumption of radical content on YouTube appears to reflect broader
patterns of news consumption across the web. Our results emphasize the
importance of measuring consumption directly rather than inferring it from
recommendations.
Comment: 29 pages, 21 figures, 15 tables